SDA 3.5 Documentation for RECODE
NAME
recode - recode variables
USAGE
recode -b filename
DESCRIPTION
RECODE uses one or more existing variables as input to create a
new SDA variable.
Ordinarily this program is invoked by the Web interface for the
SDA programs, and the user does not have to deal with the
keywords given in this document. Output from the program is in
HTML, which can be viewed with a Web browser. Users who run this
program interactively should see the
online help document.
It is also possible to run the program directly by preparing a
command file, which specifies the variables to be analyzed and
the options to use. This document explains how to prepare such a
file. The name of this batch command file is specified to the
program after the ‘-b’ option flag.
BATCH FILE LAYOUT
The batch file is divided into the following parts, which are
separated by asterisks (*) and can be given in any order:
- Definitions of the input and output variables.
- Rules or "map" for recoding the input variables into the new
output variable.
- Category labels for the new output variable (optional).
- Descriptive text for the new variable (optional).
Since the "map," category labels, and descriptive text can
have varying numbers of lines, each of those parts ends with an
asterisk (*) on a line by itself. The general layout is as
follows:
(Input and output definitions)
MAP=
(Recode map)
*
CATLABELS= [optional]
(Category text and labels)
*
TEXT= [optional]
(Descriptive text)
*
KEYWORDS FOR RECODE SPECIFICATIONS
The specifications are given in the form "keyword = something"
with one keyword per line. Keywords may be given in any order,
and the valid keywords are as follows (with significant
characters shown in capital letters):
Defining Input Variables
Keyword        Possible Specification           Default (if no keyword)
___________________________________________________________________________
STudies=       path of source dataset(s)        Look for input variables
                                                only in current directory
INvars=        name(s) of input var(s)          REQUIRED
Defining the New Variable
Keyword        Possible Specification           Default (if no keyword)
___________________________________________________________________________
OUTSTudy=      path of study for new variable   Current directory
OUTVar=        name of new variable             REQUIRED
LABEL=         long label for new variable      No long label
CATlabels=     (precedes lines of category      No category text
               text - see details below)        or labels
MAP=           (precedes lines with recode      REQUIRED
               map or rules - see below)
MD=            list of invalid codes, ranges    No defined MD codes
               (also used for output value
               if input has missing data
               -- see below)
MIN=           minimum valid code               No defined minimum
MAX=           maximum valid code               No defined maximum
OVERwrite=     yes                              Do not overwrite new var
                                                if it already exists
OTHercases=    name of the input variable       Set to MD code
               from which to take the value     (or system-missing)
               for cases that do not match
               a pattern in the MAP
TEXT=          (precedes lines of descriptive   No item text
               text - see details below)
Other options
Keyword        Possible Specification           Default (if no keyword)
___________________________________________________________________________
DIAGnostics=   yes                              No diagnostic summary of
                                                the new variable
COLorcoding=   yes                              No colored headings in the
                                                diagnostic output
GVARCase=      LOWER or UPPER                   Do not convert all variable
                                                names to lower/upper case
LAnguagefile=  Name of file with non-English    English labels on
               labels and messages              output
SAVebatch=     name of directory                No file preserved with batch
                                                commands to create new var.
               (for interactive version)        The batch file name is the
                                                name of the new variable,
                                                with the suffix ‘.rec’
ABBREVIATIONS AND REPETITIONS
Most keywords can be abbreviated. Usually only two or three
characters are required. The keyword for the category text for
the new variable, for instance, can be given as "catlabels=" or
"catlab=" or even "cat=". Either upper or lower case may be
used. If keywords are repeated, the second specification will
override the first.
COMMENTS
Anything on a line beginning with "#" is ignored by the batch
processor and can therefore be used for comments. Blank lines
are also ignored.
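The layout and comment rules described above can be illustrated with a
short Python sketch. This is not part of SDA and is not how RECODE is
implemented; the function name parse_batch and the returned data
structures are hypothetical. It assumes that comments and blank lines
are skipped everywhere (including inside a block), and it recognizes
the block keywords only by their significant characters (MAP, CAT,
TEXT) without resolving other abbreviations.

```python
# Significant characters of the block keywords MAP=, CATlabels=, TEXT=
BLOCK_PREFIXES = ("map", "cat", "text")


def parse_batch(text):
    """Split one set of RECODE commands into definitions and blocks."""
    definitions, blocks = {}, {}
    block = None  # name of the block currently being read, if any
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comment lines are ignored
        if block is not None:
            if line == "*":
                block = None  # an asterisk on a line by itself ends the block
            else:
                blocks[block].append(line)
            continue
        key, sep, value = line.partition("=")
        key, value = key.strip().lower(), value.strip()
        if not sep:
            raise ValueError(f"expected 'keyword = something': {line!r}")
        prefix = next((p for p in BLOCK_PREFIXES if key.startswith(p)), None)
        if prefix and not value:
            block = prefix
            blocks[block] = []
        else:
            definitions[key] = value  # a repeated keyword overrides the earlier one
    return definitions, blocks


example = """\
# Collapse age into 3 categories
study = /sda/testdata
invar = age
outvar = age3
md = 9

map=
1: 18-29
2: 30-49
3: 50-97
*
catlabels=
1 <30
2 30-49
3 50+
9 missing
*
"""
definitions, blocks = parse_batch(example)
```

Here definitions holds the one-line keywords exactly as they were
typed (for example, definitions["outvar"] is "age3"), and blocks holds
the lines of the recode map and the category labels.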
RECODE MAP
The rules for combining the values of one or more input variables
into a value on the output variable are contained in the recode
map. First put the ‘MAP=’ keyword on a line by itself; then put
each recode rule on a separate line. The general format is as
follows:
New value: values on var 1 [; values on var 2; ... ]
Within a rule, the value lists for the different input variables
are separated by a semicolon (;) and are given in the same order as
the input variables. After the last rule, put an asterisk (*) on a
line by itself.
For example, to recode age and gender into 4 categories (younger
male, younger female, older male, older female), one could
construct the following recode map:
map=
1: 18-49; 1
2: 18-49; 2
3: 50-97; 1
4: 50-97; 2
*
Each recode rule can include more than one value or range for
each input variable. A single asterisk (*) in a recode rule
matches any VALID value of the corresponding input variable. Two
asterisks (**) match ANY value, including missing-data (both
user-defined and system-missing) and out-of-range values. It is
possible to have more than one rule for a given output value --
notice that the output code 4 has three rules in the example
given below.
map=
1: 1,3-5,7 ; 1-10
2: 1,3-5,7 ; 11-50
3: 1,3-5,7 ; 51-90
4: 8-10,12 ; *
4: 41,45,55; 11-90
4: 61-90 ; *
9: ** ; **
*
If a case matches more than one recode rule, the first rule
encountered will apply. Notice in this example that the recode
rule ‘**; **’ matches all values of the two input variables; any
cases not covered by a rule higher up in the map will receive the
value 9.
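The matching behavior described above (first matching rule wins, ‘*’
matches any valid value, ‘**’ matches any value at all) can be modeled
with a small Python sketch. It is an illustration only, not the RECODE
implementation. The function names are hypothetical, system-missing is
represented by None, the caller must say whether each input value is
valid, and only non-negative numeric codes are handled. The sketch
also assumes that an explicitly listed code matches whether or not it
is flagged as missing data.

```python
def parse_spec(spec):
    """Parse one value list such as '1,3-5,7', '*' or '**'."""
    spec = spec.strip()
    if spec in ("*", "**"):
        return spec
    ranges = []
    for item in spec.split(","):
        lo, sep, hi = item.strip().partition("-")
        ranges.append((float(lo), float(hi if sep else lo)))
    return ranges


def parse_map(lines):
    """Turn lines like '4: 8-10,12 ; *' into (new_value, specs) rules."""
    rules = []
    for line in lines:
        new_value, _, rest = line.partition(":")
        rules.append((float(new_value), [parse_spec(s) for s in rest.split(";")]))
    return rules


def spec_matches(spec, value, is_valid):
    if spec == "**":
        return True        # any value, including missing or out of range
    if spec == "*":
        return is_valid    # any valid value
    if value is None:
        return False       # system-missing never matches an explicit list
    return any(lo <= value <= hi for lo, hi in spec)


def recode(rules, values, valid):
    """Return the new value of the first matching rule, or None if none match."""
    for new_value, specs in rules:
        if all(spec_matches(s, v, ok) for s, v, ok in zip(specs, values, valid)):
            return new_value
    return None


rules = parse_map([
    "1: 1,3-5,7 ; 1-10",
    "2: 1,3-5,7 ; 11-50",
    "3: 1,3-5,7 ; 51-90",
    "4: 8-10,12 ; *",
    "4: 41,45,55; 11-90",
    "4: 61-90 ; *",
    "9: ** ; **",
])
both_valid = (True, True)
print(recode(rules, (3, 20), both_valid))       # second rule applies
print(recode(rules, (8, None), (True, False)))  # only '**; **' applies
```

Because the rules are tried from top to bottom, the catch-all rule
‘**; **’ must come last; placed first, it would capture every case.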
CASES UNMATCHED BY THE RECODE MAP
If a case does not match any of the recode rules, the output
variable can take on one of several values, depending on the
options that were specified.
- If the ‘OTHercases=’ keyword was specified, that case will
be assigned the value of the variable specified after that
keyword.
- If the ‘OTHercases=’ keyword was NOT specified, the case
will be assigned the value specified with the ‘MD=’ keyword. If
more than one MD value was specified, the first MD value is used
for this purpose. Note that all values mentioned after the ‘MD=’
keyword are flagged as missing-data in the new variable.
- If neither the ‘OTHercases=’ keyword nor the ‘MD=’ keyword
has been specified, that case will be assigned the system-missing
value.
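The order of precedence in the list above can be written as a short
Python sketch. The function name value_for_unmatched is hypothetical,
a case is represented as a dictionary of input values, and None
stands in for the system-missing value; none of this is part of SDA
itself.

```python
def value_for_unmatched(case, othercases=None, md_codes=()):
    """Value assigned to a case that matched no rule in the recode map."""
    if othercases is not None:
        return case[othercases]   # take the value of the named input variable
    if md_codes:
        return md_codes[0]        # the first code listed after MD= is used
    return None                   # neither keyword given: system-missing


case = {"age": 45}
print(value_for_unmatched(case, othercases="age", md_codes=[99]))
print(value_for_unmatched(case, md_codes=[8, 9]))
print(value_for_unmatched(case))
```

Note that when both ‘OTHercases=’ and ‘MD=’ are given, the MD codes
still flag values as missing data in the new variable; they are just
not used as the fallback value.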
CATEGORY TEXT AND LABELS
Category text and labels for one or more codes of the new
variable can be supplied. First put the ‘CATlabels=’ keyword on
a line by itself; then specify on a separate line each code,
followed by one or more spaces or tabs, then the category text
[and short label, if desired]. (Programs such as TABLES and
MEANS will use the short label for a category, if one is
available.) Put an asterisk (*) on a line by itself after the
last label. For example:
catlabels=
1 Professional and technical [Prf,Tech]
2 Managers
3 Blue collar workers [Blue Col]
4 Other
9 Missing
*
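As an illustration of the line format just described (code, spaces or
tabs, category text, optional short label in square brackets), the
following Python sketch splits one category line into its parts. The
regular expression and the function name parse_catlabel are
assumptions made for this example and do not come from SDA.

```python
import re

# code, one or more spaces or tabs, category text, optional [short label]
CATLABEL_LINE = re.compile(r"^\s*(\S+)[ \t]+(.*?)(?:\s*\[([^\]]*)\])?\s*$")


def parse_catlabel(line):
    """Return (code, category text, short label or None) for one line."""
    match = CATLABEL_LINE.match(line)
    if match is None:
        raise ValueError(f"not a category label line: {line!r}")
    return match.group(1), match.group(2), match.group(3)


print(parse_catlabel("1 Professional and technical [Prf,Tech]"))
print(parse_catlabel("2 Managers"))
```

The short label comes back as None when no bracketed label is present.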
CHARACTER INPUT VALUES
RECODE works only with NUMERIC variables, but it can handle
character values that have been defined as missing-data codes
(such as ‘D’ or ‘R’). One of the examples below illustrates this
application.
DESCRIPTIVE TEXT
Descriptive text may be stored with the new variable. This text
can then be displayed when the variable is used in analysis
programs or in a codebook. First put the ‘TEXT=’ keyword on a
line by itself; then write as many lines of text as you wish to
store with the new variable. Put an asterisk (*) on a line by
itself after the last line of text.
MULTIPLE RECODES
RECODE commands for more than one variable can be included in the
same batch file. After the first set of commands, put a line
beginning with two asterisks (**); then the commands for another
new variable can follow. The value of the ‘STudies=’ keyword is
carried over from the previous set of commands, unless it is
respecified.
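The splitting and carry-over rule can be sketched in Python as
follows. This is a simplified model, not SDA code: the function names
are hypothetical, any line beginning with two asterisks is treated as
a separator, and the ‘STudies=’ keyword is recognized simply as a
keyword starting with its significant characters "st".

```python
def split_recode_sets(text):
    """Split a batch file into sets of commands at lines beginning with '**'."""
    sets, current = [], []
    for line in text.splitlines():
        if line.lstrip().startswith("**"):
            sets.append(current)
            current = []
        else:
            current.append(line)
    sets.append(current)
    # Drop empty sets, such as the one after a trailing '**' line.
    return [s for s in sets if any(l.strip() for l in s)]


def studies_for_each_set(sets):
    """Return the STudies= value in effect for each set of commands."""
    result, studies = [], None
    for lines in sets:
        for line in lines:
            key, sep, value = line.partition("=")
            if sep and key.strip().lower().startswith("st"):
                studies = value.strip()   # respecified: replaces the old value
        result.append(studies)            # otherwise carried over
    return result


batch = """\
study = /sda/testdata
invar = age
outvar = age3
map=
1: 18-49
2: 50-97
*
**
invars = age gender
outvar = agesex
map=
1: 18-49; 1
9: ** ; **
*
**
"""
sets = split_recode_sets(batch)
print(len(sets), studies_for_each_set(sets))
```

The rule ‘9: ** ; **’ is not mistaken for a separator because the line
begins with the new value rather than with the asterisks.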
BACKWARD COMPATIBILITY
RECODE can read most older CSA recode commands. The following
keywords are still recognized and are equivalent to the new
keywords shown in parentheses:
- longlabel (label)
- labels (catlabels)
The missing-data keywords ‘md1=value1’ and ‘md2=value2’ are
also recognized and are equivalent to the new form: ‘md= value1,
value2’.
Note, however, that in the CSA recode rules, a single asterisk
(*) matches ALL values of an input variable. SDA distinguishes
between a single asterisk, which matches only the VALID values of
an input variable, and two asterisks, which match ALL values.
EXAMPLES OF BATCH FILES
1. Collapse age into 3 categories
study = /sda/testdata
invar = age
outvar = age3
label = Collapsed age - 3 categories
md = 9
map=
1: 18-29
2: 30-49
3: 50-97
*
catlabels=
1 <30
2 30-49
3 50+
9 missing
*
**
2. Recode age and gender into 4 categories
invars = age gender
outvar = agesex
label = Age-gender typology
overwrite = yes
md = 9
map=
1: 18-49; 1
2: 18-49; 2
3: 50-97; 1
4: 50-97; 2
*
catlabels=
1 Yng Male
2 Yng Feml
3 Old Male
4 Old Feml
9 Missing
*
text=
This variable is a four-category typology
of age and gender
*
**
3. Collapse highest and lowest values of age
study = /sda/testdata
invar = age
outvar = age2070
label = Collapsed age - 20-70
# Note the use of the ‘othercases=’ option;
# only the codes given in the map are changed.
othercases = age
# We want the previous MD codes of 99 to stay as MD
md = 99
map=
20: 1-20
70: 70-97
*
catlabels=
20 20 or younger
70 70 or older
*
**
4. Convert character missing data codes to numbers
invar = spend
outvar = numspend
label = Recoded spend variable
md = 8,9
map=
1: 1-2
2: 3
8: D
9: R
*
catlabels=
1 A lot
2 Not enough
8 Don’t know
9 Refused
*
**
CSM, UC Berkeley
April 12, 2011